study(SCALAR)
study SCALAR
study

Takes extra time to study SCALAR ($_ if unspecified) in
anticipation of doing many pattern matches on the string
before it is next modified. This may or may not save time,
depending on the nature and number of patterns you are
searching on, and on the distribution of character
frequencies in the string to be searched--you probably want
to compare runtimes with and without it to see which runs
faster. Those loops which scan for many short constant
strings (including the constant parts of more complex
patterns) will benefit most. You may have only one study
active at a time--if you study a different scalar the first
is "unstudied". (The way study works is this: a linked list
of every character in the string to be searched is made, so
we know, for example, where all the 'k' characters are. From
each search string, the rarest character is selected, based
on some static frequency tables constructed from some C
programs and English text. Only those places that contain
this "rarest" character are examined.)
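One way to make the suggested runtime comparison is with the standard Benchmark module. The following is only a rough sketch: the test string, the sub names count_plain and count_studied, and the iteration count are invented for illustration. Be aware that on Perl 5.16 and later study is a no-op, so there the two timings should come out essentially the same.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a long string with the target words at the end.
my $text = join ' ',
    ('the quick brown fox jumps over the lazy dog') x 2000,
    'foo bar blurfl';

# Count how many of the three patterns match, without study.
sub count_plain {
    my ($s) = @_;
    my $n = 0;
    $n++ if $s =~ /\bfoo\b/;
    $n++ if $s =~ /\bbar\b/;
    $n++ if $s =~ /\bblurfl\b/;
    return $n;
}

# Same matches, but study the string first.
sub count_studied {
    my ($s) = @_;
    study $s;
    my $n = 0;
    $n++ if $s =~ /\bfoo\b/;
    $n++ if $s =~ /\bbar\b/;
    $n++ if $s =~ /\bblurfl\b/;
    return $n;
}

# Run each variant 2000 times and print a comparison table.
cmpthese(2000, {
    plain   => sub { count_plain($text) },
    studied => sub { count_studied($text) },
});
```

The cost of study itself is included in the studied timing, which is the comparison the text asks for. Real results depend heavily on the data and the patterns, so the numbers from this toy input should not be generalized.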
For example, here is a loop which inserts index-producing
entries before any line containing a certain pattern:
    while (<>) {
        study;
        print ".IX foo\n" if /\bfoo\b/;
        print ".IX bar\n" if /\bbar\b/;
        print ".IX blurfl\n" if /\bblurfl\b/;
        ...
        print;
    }
In searching for /\bfoo\b/, only those locations in
$_ that contain 'f' will be looked at, because 'f'
is rarer than 'o'. In general, this is a big win
except in pathological cases. The only question is
whether it saves you more time than it took to build
the linked list in the first place.
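The idea of looking only at positions of the rarest character can be sketched in plain Perl. This is a toy model, not the interpreter's actual implementation: the frequency values in %freq are invented, a hash of position lists stands in for the linked list, and the helper names (build_index, rarest_char, find_all) are hypothetical.

```perl
use strict;
use warnings;

# Invented frequency table: a lower number means a rarer character.
# The real tables were derived from C programs and English text.
my %freq = (f => 2, o => 8, b => 1, a => 9, r => 6);

# Weight of a character; characters missing from the table get a middle value.
sub weight {
    my ($c) = @_;
    return exists $freq{$c} ? $freq{$c} : 5;
}

# Record, for every character, the positions at which it occurs.
sub build_index {
    my ($s) = @_;
    my %pos;
    for my $i (0 .. length($s) - 1) {
        push @{ $pos{ substr($s, $i, 1) } }, $i;
    }
    return \%pos;
}

# Pick the rarest character of a constant search string.
sub rarest_char {
    my ($needle) = @_;
    my ($best) = sort { weight($a) <=> weight($b) } split //, $needle;
    return $best;
}

# Find all occurrences of $needle, examining only the places
# where its rarest character occurs.
sub find_all {
    my ($s, $index, $needle) = @_;
    my $c   = rarest_char($needle);
    my $off = index($needle, $c);      # offset of that char within needle
    my @hits;
    for my $p (@{ $index->{$c} || [] }) {
        my $start = $p - $off;
        next if $start < 0;
        push @hits, $start
            if substr($s, $start, length($needle)) eq $needle;
    }
    return @hits;
}

my $line  = "a foo and a bar, then food";
my $index = build_index($line);
print join(',', find_all($line, $index, 'foo')), "\n";   # prints 2,22
```

Building the index touches every character once, which is the up-front cost the text refers to; each later search then visits only the two 'f' positions rather than all 26 characters.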
Note that if you have to look for strings that you
don't know till runtime, you can build an entire
loop as a string and eval that to avoid recompiling
all your patterns all the time. Together with
undefining $/ to input entire files as one record,
this can be very fast, often faster than specialized
programs like fgrep. The following scans a list of
files (@files) for a list of words (@words), and
prints out the names of those files that contain a
match:
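    $search = 'while (<>) { study;';
    foreach $word (@words) {
        # \\b puts a literal \b (word boundary) into the generated
        # code; a bare \b inside double quotes would be a backspace
        $search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n";
    }
    $search .= "}";
    @ARGV = @files;
    undef $/;                   # read each file as one record
    eval $search;               # this screams
    $/ = "\n";                  # put back to normal input delim
    foreach $file (sort keys(%seen)) {
        print $file, "\n";
    }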
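To see what is actually handed to eval, it can help to print the generated code instead of running it. The sketch below assumes a hypothetical two-word list and adds one thing that is not in the example above: each word is passed through quotemeta so that regex metacharacters in a word are matched literally.

```perl
use strict;
use warnings;

my @words = ('foo', 'bar');    # hypothetical word list

# Build the loop as a string, one match statement per word.
my $search = 'while (<>) { study;';
foreach my $word (@words) {
    my $quoted = quotemeta $word;          # escape regex metacharacters
    $search .= "++\$seen{\$ARGV} if /\\b$quoted\\b/;\n";
}
$search .= "}";

# Show the generated code rather than evaluating it.
print $search, "\n";
```

With this word list the printed loop contains the statements `++$seen{$ARGV} if /\bfoo\b/;` and `++$seen{$ARGV} if /\bbar\b/;`, each pattern written out as a constant so it is compiled only once when the string is evaluated.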